Managing diabetes is inherently a day-to-day pursuit that frequently occurs in the context of social relationships. A handful of studies have examined the impact of social network-based interventions on managing type 2 diabetes (T2D) and have found promising results in reducing inpatient costs and improving quality of life. In this light, the Social Peer Engagement (SPE) initiative was created to leverage social interaction between participants to increase members’ engagement in the program while improving their physiological outcomes.
With different social engagement strategies being tested for this initiative, this sprint focuses on analyzing the impact of belonging to an asynchronous online support group on selected participants (SPE2 participants). In particular, it explores the group’s influence on their diabetes management, as measured by the estimated glucose value (EGV) recorded by their continuous glucose monitoring (CGM) device, and on their physical activity level, as measured by the daily steps logged by their Fitbit device. The EGV and steps data are analyzed in a two-fold approach: 1) SPE2 participants vs. a comparison group, and 2) SPE2 participants before and after the experiment period. To tie everything together, this sprint also analyzes survey results on SPE2 participants’ Fitbit group experience, which were collected concurrently with the experiment via individual coaching calls.
During the recruitment process for this asynchronous group experiment, participants were divided into three groups: Glucose All-Stars, Glucose Masters, and Glucose Champions. These members had been using the Fitbit device for some time when they were provided with a CGM device and access to a messaging forum via Fitbit’s Community feature. The CGM device was provided during the nurse visit, the earliest of which took place on 20 Aug 2018, and the participants were placed into their groups on 23 Aug 2018. Each group was independent of the others and had its own two coaches, who prompted discussion by posting weekly questions starting 29 Aug 2018. The participants were expected to engage in the forum by posting, leaving a comment, or giving a “cheer” to a post. The experiment ran for about eight weeks, during which survey questions were also asked in the participants’ one-on-one calls with their coach.
During each participant’s individual coaching call, questions were asked to learn about the participant’s experience with the Fitbit group and how it was influencing their diabetes management. As shown below, some questions were asked weekly, and some were asked only during the fourth and eighth weeks:
| WEEKLY QUESTIONS |
|---|
| Q20: On a scale of 1 to 5, how are you in control of your diabetes today? |
| Q30: On a scale of 1 to 5, is the coach supporting your needs and goals? |
| Q80: On a scale of 1 to 5, how useful are the Fitbit group discussions? |
| Q78: How can we improve the group discussion? |

| MONTHLY QUESTIONS |
|---|
| Q77: On a scale of 1 to 5, how has it been for you to access the Fitbit group? |
| Q79: On a scale of 1 to 5, how likely are you to recommend the Fitbit group discussion to others? |
| Q82: On a scale of 1 to 5, how helpful were the Fitbit group discussions? |
| Q83: What aspects of the group discussions did you find helpful? |
As mentioned above, the analysis for this study follows a two-fold approach: 1) SPE2 participants vs. a comparison group, and 2) SPE2 participants before and after the experiment period.
For the first approach, the researchers established criteria for who could be part of the comparison group. Specifically, candidate comparison group members must never have been part of any pilot or experiment under the Glucose Management Program (GMP) and, consequently, must never have been offered a place in the SPE experiments. Selection was done by matching each candidate to a corresponding SPE2 participant of the same gender and with the closest age and risk adjustment factor (RAF) score. Since the SPE2 experiment ran for eight weeks, i.e. approximately 60 days, the data for the comparison group were also restricted to a 60-day time frame, running from each member’s first day of EGV readings to their 60th day.
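The matching step could be sketched as follows. This is only an illustration under stated assumptions: the text does not say how age and RAF score were combined into a single notion of "closest", so the sketch uses a standardized squared distance and greedy one-to-one matching without replacement. The `Member` record and the `match_comparison` function are hypothetical names introduced here.

```python
from dataclasses import dataclass
from statistics import pstdev


@dataclass
class Member:
    member_id: str
    gender: str
    age: float
    raf: float  # risk adjustment factor score


def match_comparison(participants, candidates):
    """Greedy 1:1 matching: same gender, then closest age and RAF score.

    ASSUMPTION: "closest" is a squared distance on standardized age and RAF;
    the original study may have combined the two differently.
    """
    pool = list(candidates)
    everyone = list(participants) + pool
    # Scale both variables so neither dominates the distance.
    age_sd = pstdev(m.age for m in everyone) or 1.0
    raf_sd = pstdev(m.raf for m in everyone) or 1.0

    matches = {}
    for p in participants:
        same_gender = [c for c in pool if c.gender == p.gender]
        if not same_gender:
            continue  # no eligible candidate left for this participant
        best = min(
            same_gender,
            key=lambda c: ((c.age - p.age) / age_sd) ** 2
            + ((c.raf - p.raf) / raf_sd) ** 2,
        )
        matches[p.member_id] = best.member_id
        pool.remove(best)  # each candidate is used at most once
    return matches
```

Greedy matching depends on the order in which participants are processed; an optimal assignment method would avoid that, at the cost of more machinery.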
Apart from selecting the comparison group, the researchers also established the main metric for analyzing the EGV data: time-in-range (TIR). TIR is the proportion of EGV readings that fall within the acceptable range, which in this study is set to 80 to 180. An EGV reading is recorded every five minutes, for a nominal total of 288 readings per day. In some cases a day has fewer than 288 readings, because the CGM device was not worn or was not synced properly. In other cases a day has more than 288 readings because of repeated observations, which were removed during data preparation. Because of this irregularity, TIR is computed over each participant’s actual number of CGM readings per day rather than the nominal 288.
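A minimal sketch of the TIR calculation described above is given here. The function name `time_in_range` is hypothetical, and three details are assumptions rather than facts from the study: repeated observations are identified by an identical timestamp, the range bounds are treated as inclusive, and the denominator is the number of readings actually available in the window being summarized.

```python
def time_in_range(readings, low=80, high=180):
    """Proportion of a participant's EGV readings that fall within [low, high].

    readings: iterable of (timestamp, egv) pairs.
    ASSUMPTIONS: duplicates share a timestamp (first one kept), and the
    bounds are inclusive.
    """
    unique = {}
    for timestamp, egv in readings:
        unique.setdefault(timestamp, egv)  # drop repeated observations
    if not unique:
        return float("nan")  # no readings, e.g. device not worn
    in_range = sum(1 for egv in unique.values() if low <= egv <= high)
    # Divide by the actual number of readings, not the nominal 288 per day.
    return in_range / len(unique)
```

For example, a day with only 200 synced readings, 150 of them in range, gives a TIR of 0.75 rather than 150/288.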
For the second approach, the researchers defined the “before” and “after” data of the participants in the experiment. Because CGM devices were given to the participants only at the onset of the experiment, and because the experiment period had only just finished, there is technically no “before” or “after” data. For standardization purposes, “before” and “after” are defined as a participant’s first two weeks and last two weeks in the experiment. This definition still makes it possible to assess whether engaging in the Fitbit group made an impact on their diabetes management and physical activity level.
This study interpreted physiological improvement from two perspectives: 1) whether SPE2 participants have better EGV levels or daily step activity than members of the comparison group, and 2) whether SPE2 participants have improved EGV levels or daily step activity at the end of the experiment period compared to when they were just starting out in the Fitbit group. For both EGV and steps data, a permutation test was used to address the first objective, while a bootstrap and the Wilcoxon signed-rank test were used to answer the second. Using resampling methods (the permutation test and the bootstrap) and a non-parametric test (the Wilcoxon signed-rank test) allows inference to be drawn from the experiment’s small sample.
A permutation test assesses whether the observed difference between two groups could have arisen by chance alone. It constructs the null distribution, i.e. the distribution the test statistic would have if the treatment effect (in this case, belonging to the Fitbit group) were not present, by randomly reassigning individuals to the SPE2 or comparison group and recomputing the statistic each time. The two-tailed p-value is then calculated as the proportion of permutations in which the “null” statistic is at least as extreme, in absolute value, as the observed difference.
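The procedure can be sketched in a few lines. The example below uses a plain difference in group means as the test statistic and a seeded random number generator for reproducibility; the function name, the number of permutations, and the seed are illustrative choices, not details taken from the study.

```python
import random


def permutation_test(treated, comparison, n_perm=10_000, seed=0):
    """Two-tailed permutation test on the difference in group means.

    Returns (observed_difference, p_value). n_perm and seed are
    illustrative defaults, not values from the study.
    """
    def mean(values):
        return sum(values) / len(values)

    rng = random.Random(seed)
    observed = mean(treated) - mean(comparison)
    pooled = list(treated) + list(comparison)
    n = len(treated)

    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of group labels
        null_stat = mean(pooled[:n]) - mean(pooled[n:])
        if abs(null_stat) >= abs(observed):
            extreme += 1
    # Share of permutations at least as extreme as the observed difference.
    return observed, extreme / n_perm
```

Calling `permutation_test(spe2_tir, comparison_tir)` with two lists of per-person TIR values would return the observed difference and its estimated p-value. Any other statistic, such as a weighted mean, can be substituted for `mean` as long as it is recomputed on every shuffle.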
For the EGV analysis, the test statistic is the difference between the mean TIR of the SPE2 participants and the mean TIR of the comparison group members. It is obtained by calculating the TIR of each individual i in the SPE2 group and of each individual j in the comparison group, averaging these TIRs per group, and taking the difference. If the difference is positive, the SPE2 participants’ EGV readings are within the acceptable range more often, on average.
\[\frac{\sum_{i=1}^{n} \text{TIR for the whole period}_i}{n} - \frac{\sum_{j=1}^{m} \text{TIR for the whole period}_j}{m}, \qquad n = m = 20\]
For the steps data analysis, on the other hand, the test statistic is the difference between the weighted average daily steps of the SPE2 participants and that of the comparison group, where the weights are each individual’s number of days with logged steps. If this difference is positive, the SPE2 participants have the higher average daily steps.
\[\frac{\sum_{i=1}^{n} \text{Average daily steps}_i \cdot \text{Number of days with data}_i}{\sum_{i=1}^{n} \text{Number of days with data}_i} - \frac{\sum_{j=1}^{m} \text{Average daily steps}_j \cdot \text{Number of days with data}_j}{\sum_{j=1}^{m} \text{Number of days with data}_j}, \qquad n = m = 20\]
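As a small worked illustration of this statistic, the sketch below computes the weighted mean for each group and takes the difference. The function names and the input layout, a list of `(average_daily_steps, days_with_data)` pairs per group, are assumptions made for the example.

```python
def weighted_mean_daily_steps(members):
    """Weighted mean of per-person average daily steps.

    members: list of (average_daily_steps, days_with_data) pairs.
    Weighting by days with data makes the result equal to the group's
    total logged steps divided by its total logged days.
    """
    total_days = sum(days for _, days in members)
    if total_days == 0:
        raise ValueError("no days with logged steps")
    return sum(avg * days for avg, days in members) / total_days


def weighted_difference(spe2, comparison):
    """SPE2 minus comparison; positive means SPE2 walked more on average."""
    return weighted_mean_daily_steps(spe2) - weighted_mean_daily_steps(comparison)
```

With two SPE2 members logging 6000 steps over 10 days and 3000 steps over 30 days, the weighted mean is 3750 rather than the unweighted 4500, so members with few logged days do not dominate the group average.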
For both the EGV and steps data analyses, the permutation test was used to check whether these observed differences between the SPE2 group and the comparison group are statistically significant.
The bootstrap was used primarily for the before-and-after analysis of the SPE2 participants’ EGV and steps data. The bootstrap, another resampling method, constructs an approximate sampling distribution of a statistic computed from the data by repeatedly resampling the data with replacement, and from that distribution a confidence interval for the statistic can be obtained. Although used mainly for generating confidence intervals, it can be adapted for hypothesis testing, as with the permutation test.
For the EGV analysis, the difference in mean TIR before and after the experiment was used as the test statistic for the bootstrap. That is, each individual’s TIR was measured twice: once over the first two weeks of the experiment (“before”) and once over the last two weeks (“after”). If the difference between these two TIRs is positive, the participant’s EGV readings were within the acceptable range more often during the latter part of the experiment.
\[\text{TIR after}_i - \text{TIR before}_i\]
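One common way to turn such per-participant differences into a confidence interval is the percentile bootstrap, sketched below. Whether the study used the percentile method or another bootstrap interval is not stated, so treat the method, the function name, the number of resamples, and the seed as assumptions.

```python
import random


def bootstrap_ci(differences, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of paired differences.

    differences: one (after - before) value per participant.
    ASSUMPTION: percentile interval; the study's exact variant is unknown.
    Returns (observed_mean, (lower, upper)).
    """
    rng = random.Random(seed)
    n = len(differences)
    # Resample participants with replacement and recompute the mean each time.
    boot_means = sorted(
        sum(rng.choices(differences, k=n)) / n for _ in range(n_boot)
    )
    lower = boot_means[int((alpha / 2) * n_boot)]
    upper = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    observed = sum(differences) / n
    return observed, (lower, upper)
```

If the resulting interval excludes zero, the mean difference is significant at the chosen level. If it includes zero, no change can be claimed.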
Following the same structure, the steps analysis used the difference between the average daily steps during the last and first two weeks of the experiment. If the difference is positive, the participant’s average daily steps became higher than when they were still new to the Fitbit group.
\[\text{Average daily steps after}_i - \text{Average daily steps before}_i\]
Because the bootstrap is known to have some drawbacks with small sample sizes, e.g. it rejects the null hypothesis more often than it should, the Wilcoxon signed-rank test was also used. The Wilcoxon signed-rank test is a non-parametric hypothesis test for comparing paired samples or repeated measurements on a single sample. Here, it compares the TIR and the average daily steps of the same individual, again measured once over the first two weeks of the experiment (“before”) and once over the last two weeks (“after”).
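To make the mechanics of the signed-rank test concrete, here is a small self-contained sketch of the exact two-sided version. It assumes there are no ties among the absolute differences and drops zero differences. The function name and the sign convention (after minus before) are choices made for this example and may differ from the software used in the study; the convention does not affect the two-sided p-value.

```python
def wilcoxon_signed_rank(before, after):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    ASSUMPTIONS: no tied absolute differences; zero differences dropped.
    Returns (w_plus, p_value), where w_plus is the sum of the ranks of the
    positive (after - before) differences.
    """
    diffs = [a - b for a, b in zip(after, before) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences from smallest (1) to largest (n).
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    w_plus = sum(rank for rank, i in enumerate(order, start=1) if diffs[i] > 0)

    # Exact null distribution: under H0 each rank is positive with
    # probability 1/2, so count the subsets of {1..n} by their rank sum.
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum
    for r in range(1, n + 1):
        for s in range(max_sum, r - 1, -1):
            counts[s] += counts[s - r]

    total = 2 ** n
    lower_tail = sum(counts[: w_plus + 1]) / total
    upper_tail = sum(counts[w_plus:]) / total
    return w_plus, min(1.0, 2 * min(lower_tail, upper_tail))
```

With five pairs whose differences are all positive, the statistic is 15 and the p-value is 2/32 = 0.0625, which shows how few participants are needed before the exact test can no longer reach the 5% level.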
Results are divided into three parts: EGV analysis, steps analysis, and survey analysis. Descriptive measures are presented in the EGV and steps analyses to give an initial picture of the physiological performance of the SPE2 participants and the comparison group, as well as of the “before” and “after” improvement of the SPE2 participants. Descriptive measures are also presented in the survey analysis to explain the views and opinions of participants in the experiment. The non-parametric test and resampling methods were then used to validate the findings from the descriptives.
The EGV levels of the SPE2 participants and the comparison group appear similar. All individuals’ EGV levels over the set period appear to be in control, meaning that most of the time their EGV readings are within the 80 to 180 range. However, more members of the comparison group (7) than SPE2 participants (3) have EGV readings that are out of range more than 20% of the time. In addition, the EGV levels of the SPE2 participants appear more stable, i.e. they have shorter daily bars, than those of the comparison group members.
Having considered the descriptives of the EGV data for both groups, it is also reasonable to validate the findings with a permutation test to check whether the two groups really differ. The permutation test on mean EGV suggests that the EGV levels of the SPE2 participants are better than those of the comparison group, although the p-value sits right at the conventional 0.05 threshold. The permutation test on the proportion of TIR, however, shows no significant difference between the two groups. That is, both appear to be in control of their EGV levels.
| OBSERVED DIFFERENCE (MEAN EGV) | P-VALUE |
|---|---|
| -11.92111 | 0.05 |

| OBSERVED DIFFERENCE (MEAN TIR) | P-VALUE |
|---|---|
| 0.0887661 | 0.951 |
In addition to the previous analysis, Figure 3 shows that SPE2 participants appear to have improved in their last two weeks in the program. To give an overview of how frequently participants are in control of their EGV levels, the researchers defined the categories “Always in-control”, “Often in-control”, and “Sometimes in-control”. The graph below shows that more of the 20 participants are “Always in-control” in their last two weeks (45%) than in their first two weeks (40%). The share of participants who are “Often in-control” also increased, from 25% to 35%.
To validate the above findings, the researchers ran a bootstrap and a Wilcoxon signed-rank test. Neither test shows a significant difference between the “before” and “after” TIR of the SPE2 participants: the bootstrap 95% confidence interval includes zero, and the Wilcoxon p-value is above 0.05.
| OBSERVED DIFFERENCE | 95% CI |
|---|---|
| 0.0686013 | (-0.03, 0.16) |

| TEST STATISTIC | P-VALUE |
|---|---|
| 59 | 0.1564026 |
Looking at individuals’ daily steps (Figure 4 and Figure 5), a pattern can be observed among participants in the SPE2 group: their steps during the first two weeks are higher than in the other weeks. For some participants, e.g. 23814 and 30907, daily steps drop suddenly after two weeks in the experiment and remain consistently lower in the subsequent weeks. Other participants, e.g. 22847 and 10632, show a gradual decrease in steps over their first two weeks. Most participants can also be considered inactive, i.e. their mean daily steps are below 5000.
Unlike in the comparison group, no participant in the SPE2 experiment has been consistent in their daily steps. In the comparison group, most participants, e.g. 49967 and 25780, have fairly stable daily steps, although their mean daily steps are quite low. Overall, the comparison group’s step activity level is similar to that of the SPE2 participants.
The permutation test supports this observation: the weighted mean daily steps of the two groups are not significantly different from each other.
| OBSERVED DIFFERENCE | P-VALUE |
|---|---|
| 987.6662 | 0.434 |
Figure 6 is consistent with the declining pattern noted earlier for the SPE2 participants. The percentage of participants who were active declined from 25% in the first two weeks to 10% in the last two weeks (out of 20 participants). As the percentage of active participants decreased, the percentage of inactive participants doubled, from 30% in the first two weeks to 60% in the last two weeks.
Consistent with the before-and-after graph, the bootstrap and the Wilcoxon signed-rank test also indicate a change in physical activity level over the course of the experiment. In particular, the participants’ average daily steps decreased significantly in the last two weeks of the experiment.
| OBSERVED DIFFERENCE | 95% CI |
|---|---|
| -4304.241 | (-6384.93, -2197.25) |

| TEST STATISTIC | P-VALUE |
|---|---|
| 186 | 2.67e-05 |
The level of engagement in the Fitbit group varied across the three groups (Glucose Masters, Champions, and All-Stars), but it was in general very low, with only one member in each group actively participating by posting, leaving a comment, or giving a “cheer.” Participants were also not able to give feedback consistently throughout the survey period. Based on the individual coaching calls, particularly the responses to Q78 and Q83, a few participants had not yet even accessed the Fitbit group six weeks into the experiment, either for lack of motivation or because they needed technical assistance. While the dominant response to these open-ended questions was “no comment,” the need for technical assistance and lack of motivation are believed to be what hinders participants from actively engaging in the Fitbit group. Some also said that they found it hard to relate to the stories of their co-participants, and suggested that participants be grouped by age or that there be more activities. In general, however, there seems to be no problem with their coaches.
Despite the evident lack of engagement, participants also gave positive feedback on the group discussion. Some said that although they do not actually engage in the discussion, knowing that they share experiences and goals with other people makes them feel they are not alone and helps them keep working toward their own goals. In one group, the discussion even convinced a participant to avoid bringing sweets to work. As shown by the weekly question Q20, participants also seem to feel more in control of their diabetes the longer they stay in the program. Participants likewise become more certain of their perception of the usefulness of the Fitbit group discussions (Q80) as the weeks go by, as shown by the gradual increase in individuals with positive sentiment. Overall, while participation among the members is low, many have expressed positive feedback about having shared experiences and mutual responsibilities with their co-participants.
This sprint set out to answer whether belonging to an asynchronous online support group influences participants’ diabetes management as measured by their physiological outcomes: their EGV level and steps activity. The study recognized physiological improvement from two perspectives: 1) improvement of SPE2 participants vs. a comparison group matched by demographics and RAF score, and 2) improvement of SPE2 participants over time, i.e. before and after the experiment.
Results of the experiment show that the participants’ engagement in the Fitbit group was very low, assumed to be primarily due to their need for technical assistance and lack of motivation. Participants were also not able to give feedback consistently throughout the survey period. Nonetheless, when participants did respond, some expressed positive feedback about the group discussions, saying that knowing they share experiences with others made them feel less alone and, in one particular “Glucose” group, even helped them develop a sense of mutual responsibility.
With regard to the EGV level and steps activity analysis, initial observations on the EGV levels of the SPE2 participants were promising. Compared to the comparison group, not only is their time out of range, i.e. the proportion of EGV readings outside the acceptable range, noticeably lower, but their EGV level is also more stable throughout the 60-day period. More participants also felt and became more in control of their diabetes, as measured by their before and after TIR and as revealed by the survey results. Unfortunately, further testing showed no significant difference between the average TIR of the SPE2 participants and that of the comparison group, nor between the SPE2 participants’ TIR before and after the experiment. On the other hand, an interesting pattern was observed in the SPE2 participants’ activity level: their daily steps were evidently higher during their first two weeks than in all the other weeks. Further testing supports a significant decrease in the participants’ average daily steps relative to when they were new to the Fitbit group. Other than that, and as confirmed by the permutation test results, their (weighted) average daily steps are similar to those of the comparison group.
In general, although engagement from the SPE2 participants in the Fitbit group was low, it is believed that some of them genuinely appreciated their time in the group. It took a while for them to become certain of their perception of the usefulness of the group discussions, but they felt more in control of their diabetes the longer they stayed in the program. Unfortunately, this sentiment did not translate into measurable improvement in their EGV levels or steps activity.